Normalized Maximum Likelihood Models with Memory for Genomics

نویسندگان

  • Ioan Tăbuş
  • Jaakko Astola
چکیده

This paper will review the recent applications of MDL techniques for genomic sequence analysis presented in detail in [12] and [13]. The normalized maximum likelihood (NML) model [5]-[8] for a class of Markov sources [13] was recently used for the compression of full genomes, obtaining for the human genome the best existing compression results [4]. We have shown that one of the underlying biological features implicitely uncovered by the compression algorithm is the existence of approximate gene duplication. We proposed a refined method based on the same NML models for the segmentation of DNA sequences for uncovering gene duplications [12]. Several analysis tasks in genomic sequences involve preliminary segmentation or clustering of the data, which can be performed by a number of techniques, based on various similarity measures. The process of sequence matching is used for solving the problem of uncovering gene duplications with the help of a preliminary segmentation of a complex DNA locus, known to have evolved through a series of duplications.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A comparison of algorithms for maximum likelihood estimation of Spatial GLM models

In spatial generalized linear mixed models, spatial correlation is assumed by adding normal latent variables to the model. In these models because of the non-Gaussian spatial response and the presence of latent variables the likelihood function cannot usually be given in a closed form, thus the maximum likelihood approach is very challenging. The main purpose of this paper is to introduce two n...

متن کامل

Modified Maximum Likelihood Estimation in First-Order Autoregressive Moving Average Models with some Non-Normal Residuals

When modeling time series data using autoregressive-moving average processes, it is a common practice to presume that the residuals are normally distributed. However, sometimes we encounter non-normal residuals and asymmetry of data marginal distribution. Despite widespread use of pure autoregressive processes for modeling non-normal time series, the autoregressive-moving average models have le...

متن کامل

A Bayesian Nominal Regression Model with Random Effects for Analysing Tehran Labor Force Survey Data

Large survey data are often accompanied by sampling weights that reflect the inequality probabilities for selecting samples in complex sampling. Sampling weights act as an expansion factor that, by scaling the subjects, turns the sample into a representative of the community. The quasi-maximum likelihood method is one of the approaches for considering sampling weights in the frequentist framewo...

متن کامل

QUASI-MAXIMUM LIKELIHOOD ESTIMATION FOR A CLASS OF CONTINUOUS-TIME LONG-MEMORY PROCESSES By Henghsiu Tsai and K. S. Chan Academia Sinica and University of Iowa

Tsai and Chan (2003) has recently introduced the Continuous-time AutoRegressive Fractionally Integrated Moving-Average (CARFIMA) models useful for studying long-memory data. We consider the estimation of the CARFIMA models with discrete-time data by maximizing the Whittle likelihood. We show that the quasimaximum likelihood estimator is asymptotically normal and efficient. Finite-sample propert...

متن کامل

Comparison of Artificial Neural Network, Decision Tree and Bayesian Network Models in Regional Flood Frequency Analysis using L-moments and Maximum Likelihood Methods in Karkheh and Karun Watersheds

Proper flood discharge forecasting is significant for the design of hydraulic structures, reducing the risk of failure, and minimizing downstream environmental damage. The objective of this study was to investigate the application of machine learning methods in Regional Flood Frequency Analysis (RFFA). To achieve this goal, 18 physiographic, climatic, lithological, and land use parameters were ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008